Word Embeddings in PyTorch
A few quick notes about how to use embeddings in PyTorch, and in deep learning programming in general.
We also need to define an index for each word when using embeddings.
Each word is represented by an index (an integer).
Use a dictionary word_to_ix, as in the sketch below.
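A minimal sketch of building word_to_ix from raw text (the sentence here is just an illustration):
code:python
# assign each unique word an integer index, in order of first appearance
word_to_ix = {}
for word in "the quick brown fox jumps over the lazy dog".split():
    if word not in word_to_ix:
        word_to_ix[word] = len(word_to_ix)
print(word_to_ix["quick"])  # 1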
Embeddings are stored as a |V| × D matrix, where D is the dimensionality of the embeddings, such that the word assigned index i has its embedding stored in the i-th row of the matrix.
nn.Embedding takes two arguments: the vocabulary size and the dimensionality of the embeddings.
That is, the |V| and D described above.
(You can also load pretrained weights via from_pretrained!)
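A quick sketch of nn.Embedding.from_pretrained (the weight values below are made up; note that it freezes the weights by default):
code:python
import torch
import torch.nn as nn

# made-up 2x3 weight matrix standing in for real pretrained vectors
pretrained = torch.FloatTensor([[0.1, 0.2, 0.3],
                                [0.4, 0.5, 0.6]])
embedding = nn.Embedding.from_pretrained(pretrained)  # freeze=True by default
embedding(torch.LongTensor([1]))  # tensor([[0.4, 0.5, 0.6]]), i.e. row 1 of the matrix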
To index into this table, you must use torch.LongTensor (since the indices are integers, not floats).
code:python
>> # example from the torch.nn.Embedding documentation
>> import torch
>> import torch.nn as nn
>> torch.manual_seed(1)
>> embedding = nn.Embedding(10, 3)  # |V| = 10, D = 3
>> inputs = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])  # size (2, 4)
>> outputs = embedding(inputs)
>> outputs.size()  # torch.Size([2, 4, 3]); index 0 of the Size is 2 because inputs has length 2
>> outputs[0].size()  # torch.Size([4, 3]): 4 words (indices), each represented by a 3-dimensional embedding
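To connect this back to the |V| × D picture above: the lookup for index i really does return row i of embedding.weight. A small self-contained check (the word_to_ix entry here is a hypothetical index assignment):
code:python
import torch
import torch.nn as nn

torch.manual_seed(1)
embedding = nn.Embedding(10, 3)
word_to_ix = {"hello": 4}  # hypothetical: "hello" was assigned index 4
ix = torch.LongTensor([word_to_ix["hello"]])
# the embedding of "hello" equals row 4 of the weight matrix
print(torch.equal(embedding(ix)[0], embedding.weight[4]))  # True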